Abstract

We geometrize genetics by representing genetic information by points
of the 4-adic information space. By a well-known theorem of number
theory, this space can also be represented as the 2-adic space. The
process of DNA reproduction is described by the action of a 4-adic
(or, equivalently, 2-adic) dynamical system. Genes contain information
for the production of proteins. The genetic code is a degenerate map
of codons to proteins. We model this map as the functioning of a
polynomial dynamical system. The purely mathematical problem under
consideration is to find a dynamical system that reproduces the
degenerate structure of the genetic code. We present one possible
solution of this problem.

1 Introduction 



During the last ten years numerous applications of p-adic numbers
have been found outside the domain of number theory, in particular
in quantum physics (Beltrametti and Cassinelli, 1972; Volovich, 1987;
Vladimirov et al, 1994; Khrennikov, 1994) and in the theory of
disordered systems (Avetisov et al, 1999 a,b, 2002 a,b; Parisi and
Sourlas, 2000; Kozyrev et al, 2005; Khrennikov and Kozyrev, 2006 a,b).
We draw attention to the series of papers Khrennikov, 1997, 1998 a,b,
1999, 2000 a,b, and 2004 a,b, in which the p-adic information space
was introduced and applied to information theory, cognitive and social
sciences, psychology and neurophysiology; see also Pitkanen, 1998, and
Khrennikov and Nilsson, 2004. The main distinguishing feature of
encoding information by p-adic numbers is the possibility of encoding
the hierarchical structure of information through the ultrametric
topology on the p-adic tree; cf. also Voronkov, 2002 a,b. As mentioned,
this possibility was explored extensively in Khrennikov, 1997,
1998 a,b, 1999, 2000 a,b, and 2004 a,b. Recently it was pointed out
that the same p-adic information space can be applied to mathematical
modeling of gene expression (Dragovich, 2006; Khrennikov, 2006 a,b).

We apply this approach to genetics. We now outline the development
of the model. DNA and RNA sequences are represented by 4-adic
numbers. Nucleotides are mapped to digits in the registers of 4-adic
numbers: thymine T = 0, cytosine C = 1, adenine A = 2, and guanine
G = 3. The U-nucleotide (uracil) is represented, like T, by 0 (see
footnote 1). DNA and RNA sequences have a natural hierarchical
structure: letters located at the beginning of a chain are considered
more important. This hierarchical structure coincides with the
hierarchical structure of the 4-adic tree. Such a hierarchy can also
be encoded by the 4-adic metric. The process of DNA reproduction is
described by the action of a 4-adic dynamical system. Genes contain
information for the production of proteins. The genetic code is a
degenerate map of codons to proteins. We model this map as the
functioning of a polynomial 4-adic dynamical system. Proteins are
associated with cycles of such a dynamical system. By a well-known
theorem of number theory, this dynamics can also be represented in
the 2-adic space.



2 m-adic ultrametric spaces 

The notion of a metric space is used in many applications for
describing distances between objects. Let $X$ be a set. A function
$\rho : X \times X \to \mathbf{R}_+$ (where $\mathbf{R}_+$ is the set
of non-negative real numbers) is said to be a metric if it has the
following properties: 1) $\rho(x, y) = 0$ iff $x = y$ (non-degenerate);
2) $\rho(x, y) = \rho(y, x)$ (symmetric); 3) $\rho(x, y) \le
\rho(x, z) + \rho(z, y)$ (the triangle inequality). The pair
$(X, \rho)$ is called a metric space.

1 As an introduction to modern problems of genetics one can use the special issue of
Nature Collection, 2006.

We are interested in the following class of metric spaces $(X, \rho)$.
Every point $x$ has an infinite number of coordinates

$$x = (\alpha_0, \ldots, \alpha_n, \ldots). \qquad (1)$$

Each coordinate takes a finite number of values,

$$\alpha_j \in A_m = \{0, \ldots, m - 1\}, \qquad (2)$$

where $m > 1$ is a natural number, the base of the alphabet $A_m$. The
metric $\rho$ should be a so-called ultrametric, i.e., satisfy the
strong triangle inequality:

$$\rho(x, y) \le \max[\rho(x, z), \rho(z, y)], \quad x, y, z \in X. \qquad (3)$$

The strong triangle inequality can be stated geometrically: each side
of a triangle is at most as long as the longer of the other two
sides. Consequently every triangle is isosceles, with its two longest
sides equal; a generic triangle in ordinary Euclidean space does not
have this property.

We denote the space of sequences (1), (2) by the symbol $\mathbf{Z}_m$.
The standard ultrametric is introduced on this set in the following
way. For two points

$$x = (\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n, \ldots), \quad y = (\beta_0, \beta_1, \beta_2, \ldots, \beta_n, \ldots) \in \mathbf{Z}_m,$$

we set

$$\rho_m(x, y) = \frac{1}{m^k} \quad \text{if } \alpha_j = \beta_j,\ j = 0, 1, \ldots, k - 1, \text{ and } \alpha_k \ne \beta_k$$

(and $\rho_m(x, y) = 0$ if $x = y$). This is a metric, and even an
ultrametric. To find the distance $\rho_m(x, y)$ between two strings
of digits $x$ and $y$ we have to find the first position $k$ at which
the strings have different digits.
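The rule for computing the distance can be illustrated by a minimal Python sketch. The function name `madic_distance` and the representation of points as finite tuples of digits are assumptions made here for illustration; finite tuples stand in for the infinite sequences of the text.

```python
from itertools import product

def madic_distance(x, y, m):
    """Distance rho_m between two digit tuples of equal length.

    x and y list digits from position 0 onward. Identical tuples are
    treated as the same point, so their distance is 0.
    """
    for k, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 1.0 / m**k  # k is the first position where the digits differ
    return 0.0

# Check the strong triangle inequality (3) on all 4-adic words of length 3.
words = list(product(range(4), repeat=3))
assert all(
    madic_distance(x, y, 4) <= max(madic_distance(x, z, 4), madic_distance(z, y, 4))
    for x in words for y in words for z in words
)

print(madic_distance((2, 0, 1), (2, 0, 3), 4))  # 0.0625, i.e. 1/4**2
```

Two words that share a longer common prefix are closer, which is exactly the tree picture described next.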

Geometrically, we can picture the system of m-adic integers (which
will be the mathematical basis of our cognitive models) as a
homogeneous tree with m branches splitting at each vertex. The
distance between mental states is determined by the length of their
common root: close mental states have a long common root. The
corresponding geometry differs strongly from ordinary Euclidean
geometry.

Figure 1: The 2-adic tree

Let $(X, \rho)$ be an arbitrary ultrametric space. For
$r \in \mathbf{R}_+$, $a \in X$, we set

$$U_r(a) = \{x \in X : \rho(x, a) \le r\}, \quad U_r^-(a) = \{x \in X : \rho(x, a) < r\}.$$

These are balls of radius $r$ with center $a$. Balls have the
following properties:

1) Let $U$ and $V$ be two balls in $X$. Then there are only two
possibilities: (a) the balls are ordered by inclusion (i.e.,
$U \subset V$ or $V \subset U$); (b) the balls are disjoint (see
footnote 2).

2) Each point of a ball may serve as a centre.

3) In some ultrametric spaces a ball may have infinitely many radii.

Let $m > 1$ be a fixed natural number. We consider the m-adic metric
space $(\mathbf{Z}_m, \rho_m)$. This metric space has a natural
algebraic structure.

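A point $x = (\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n, \ldots)$
of the space $\mathbf{Z}_m$ can be identified with a so-called m-adic
number:

$$x = \alpha_0 \alpha_1 \ldots \alpha_k \ldots = \alpha_0 + \alpha_1 m + \ldots + \alpha_k m^k + \ldots. \qquad (4)$$

The series (4) converges in the metric space $\mathbf{Z}_m$. In
particular, a finite string $x = \alpha_0 \alpha_1 \ldots \alpha_k$
can be identified with the natural number

$$x = \alpha_0 + \alpha_1 m + \ldots + \alpha_k m^k.$$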

It is possible to introduce algebraic operations on the set of m-adic
numbers $\mathbf{Z}_m$, namely addition, subtraction, and
multiplication. These operations are the natural extensions, by m-adic
continuity, of the standard operations on the set of natural numbers
$\mathbf{N} = \{0, 1, 2, 3, \ldots\}$.

2 In Euclidean space there is a third possibility: two balls may partially overlap.

3 Mental information space 

We shall use the following mathematical model for mental information 
space: 

(1) Set-structure: The set of mental states $X_{\rm mental}$ has the
structure of the m-adic tree: $X_{\rm mental} = \mathbf{Z}_m$.

(2) Topology: Two mental states $x$ and $y$ are close if they have a
sufficiently long common root. This topology is described by the
metric $\rho_m$.

In our mathematical model mental space is represented as the metric
space $(\mathbf{Z}_m, \rho_m)$.

4 Genetic information space 

Genetic information space arises as a special case of mental informa- 
tion spaces. 

4.1 4-adic encoding of nucleotides 

We shall use the following mathematical model for genetic information 
space. We choose the 4-adic representation for DNA and RNA: 

T = 0, C = 1, A = 2, G = 3

and

U = 0, C = 1, A = 2, G = 3.

An arbitrary gene in a DNA sequence is encoded by a 4-adic integer,
for example:

$$\mathrm{ATCGTA}\ldots \to 201302\ldots = 2 + 4^2 + 3 \cdot 4^3 + 2 \cdot 4^5 + \ldots$$
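The encoding of the example above can be reproduced by a short Python sketch. The names `NUCLEOTIDE_DIGIT`, `encode` and `value` are hypothetical helpers introduced here; only the digit assignment itself comes from the text.

```python
# Digit assignment from the text: T (or U) = 0, C = 1, A = 2, G = 3.
NUCLEOTIDE_DIGIT = {"T": 0, "U": 0, "C": 1, "A": 2, "G": 3}

def encode(sequence):
    """Map a nucleotide string to its list of 4-adic digits."""
    return [NUCLEOTIDE_DIGIT[n] for n in sequence]

def value(digits, m=4):
    """Natural number a_0 + a_1*m + ... + a_k*m**k of a finite digit string."""
    return sum(a * m**j for j, a in enumerate(digits))

digits = encode("ATCGTA")
print(digits)         # [2, 0, 1, 3, 0, 2]
print(value(digits))  # 2 + 4**2 + 3*4**3 + 2*4**5 = 2258
```

Because T and U share the digit 0, a DNA string and its RNA transcript receive the same code.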

Of course, biologically realizable sequences are finite (but very
long). Thus they correspond to natural numbers. But in a mathematical
model we can also use infinitely long genetic sequences. Denote this
space by the symbol $X_{\rm genetic}$. This space has the following
distinguishing features:

(a) Set-structure: The set of genetic states $X_{\rm genetic}$ has the
structure of the 4-adic tree: $X_{\rm genetic} = \mathbf{Z}_4$.

(b) Topology: Two genetic states $x$ and $y$ are close if they have a
sufficiently long common root (see footnote 3). This topology is
described by the metric $\rho_4$.

(c) Dynamics: Information processing at the level of genetic states
is described by 4-adic dynamical systems. In the simplest case of
discrete-time dynamics these are iterations of a map

$$f : \mathbf{Z}_4 \to \mathbf{Z}_4.$$

(d) Hierarchical structure: The coding system used in our model for
recording vectors of information generates a hierarchical structure
between the digits of these vectors, i.e., between nucleotides in the
gene sequence. Thus if $x = (\alpha_0, \alpha_1, \ldots, \alpha_n,
\ldots)$, $\alpha_j = 0, 1, 2, 3$, is an information vector which
represents genetic information, then the digits $\alpha_j$ have
different weights. The digit $\alpha_0$ is the most important,
$\alpha_1$ dominates over $\alpha_2, \ldots, \alpha_n$, and so on.

4.2 Transcription map

Transcription is the process of copying a gene into RNA. This is the
first step of turning a gene into a protein (although not all
transcriptions lead to proteins). In our coding system transcription
is simply the identity map $\mathbf{Z}_4 \to \mathbf{Z}_4$ (since the
T and U nucleotides are represented by the same digit).

5 Genetic code 

5.1 Encoding of proteins by codons 

In the genetic code proteins are encoded by codons, blocks of length
3 in the gene transcript. Each codon contains the information for
producing a single amino acid. Using our 4-adic coding system we can
rewrite the table of the genetic code; see, e.g., Wikipedia, 2006. We
collect amino acids into families according to the number of codons
used to encode an amino acid:

(1). Met: 203; Trp: 033;

(2). Asn: 220, 221; Asp: 320, 321; Cys: 030, 031; Gln: 122, 123;
Glu: 322, 323; His: 120, 121; Lys: 222, 223; Phe: 000, 001;
Tyr: 020, 021;

(3). Ile: 200, 201, 202; Stop: 023, 032, 022;

(4). Ala: 310, 311, 312, 313; Gly: 330, 331, 332, 333;
Pro: 110, 111, 112, 113; Thr: 210, 211, 212, 213;
Val: 300, 301, 302, 303;

(6). Arg: 130, 131, 132, 133, 232, 233;
Leu: 002, 003, 100, 101, 102, 103;
Ser: 010, 011, 012, 013, 230, 231.

3 Thus the first SNP (single nucleotide polymorphism) distinguishes two genetic states.
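The families can be regenerated mechanically from the standard genetic code. The sketch below is not part of the original model: it assumes the usual 64-letter amino-acid string of the standard code with bases ordered U, C, A, G, an ordering in which the position of a base coincides with its 4-adic digit. The names `AMINO_ACIDS`, `THREE_LETTER` and `codon_table` are introduced here for illustration.

```python
from collections import defaultdict

# Standard genetic code; codons run over bases in the order U, C, A, G,
# first base slowest. Base index = 4-adic digit (U=0, C=1, A=2, G=3).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stop",
    "C": "Cys", "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln",
    "R": "Arg", "I": "Ile", "M": "Met", "T": "Thr", "N": "Asn",
    "K": "Lys", "V": "Val", "A": "Ala", "D": "Asp", "E": "Glu",
    "G": "Gly",
}

def codon_table():
    """Return {amino acid: list of 4-adic codon words}."""
    table = defaultdict(list)
    for i, aa in enumerate(AMINO_ACIDS):
        word = f"{i // 16}{(i // 4) % 4}{i % 4}"  # three 4-adic digits
        table[THREE_LETTER[aa]].append(word)
    return dict(table)

table = codon_table()

# Group amino acids by the number of codons that encode them.
by_degeneracy = defaultdict(list)
for name, words in table.items():
    by_degeneracy[len(words)].append(name)

for size in sorted(by_degeneracy):
    print(size, sorted(by_degeneracy[size]))
```

The printed family sizes are 1, 2, 3, 4 and 6, matching the grouping used in the table.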

5.2 Codon-map 

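First we consider the standard left shift:

$$s_l(\alpha_0 \alpha_1 \alpha_2 \ldots) = \alpha_1 \alpha_2 \ldots$$

We also consider the following cutoff map:

$$c_3(\alpha_0 \alpha_1 \alpha_2 \ldots) = \alpha_0 \alpha_1 \alpha_2.$$

Then the representation of the gene expression by codons is given by
the $c_3$-projections of the iterations of the left shift:

$$x_n = c_3\big(s_l^{3(n-1)}(x)\big), \quad n = 1, 2, \ldots,$$

so that the n-th codon consists of the digits $\alpha_{3(n-1)}$,
$\alpha_{3(n-1)+1}$, $\alpha_{3(n-1)+2}$.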
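A minimal Python sketch of the codon map follows. It represents a finite gene as a string of 4-adic digits; the function names `left_shift`, `cutoff3` and `codons` are hypothetical, and the shift by three positions per codon is the reading assumed here (a shift by one position would produce overlapping triples instead of codons).

```python
def left_shift(digits, times=1):
    """Apply the left shift s_l the given number of times."""
    return digits[times:]

def cutoff3(digits):
    """Cutoff map c_3: keep the first three digits."""
    return digits[:3]

def codons(digits):
    """x_n = c_3(s_l^{3(n-1)}(x)) for n = 1, 2, ... over a finite string."""
    count = len(digits) // 3
    return [cutoff3(left_shift(digits, 3 * (n - 1))) for n in range(1, count + 1)]

print(codons("203000033"))  # ['203', '000', '033'], i.e. Met, Phe, Trp
```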

6 2-adic encoding of proteins 

The 4-adic encoding can easily be transformed into a 2-adic encoding
by writing each 4-adic digit as a pair of binary digits, lowest digit
first. This gives the 2-adic representation of the genetic alphabet:
U = 00, A = 01, C = 10, G = 11.

We again collect amino acids into families according to the number
of codons used to encode an amino acid:

(1). Met: 010011; Trp: 001111;

(2). Asn: 010100, 010110; Asp: 110100, 110110; Cys: 001100, 001110;
Gln: 100101, 100111; Glu: 110101, 110111; His: 100100, 100110;
Lys: 010101, 010111; Phe: 000000, 000010; Tyr: 000100, 000110;

(3). Ile: 010000, 010010, 010001; Stop: 000111, 001101, 000101;

(4). Ala: 111000, 111010, 111001, 111011;
Gly: 111100, 111110, 111101, 111111;
Pro: 101000, 101010, 101001, 101011;
Thr: 011000, 011010, 011001, 011011;
Val: 110000, 110010, 110001, 110011;

(6). Arg: 101100, 101110, 101101, 101111, 011101, 011111;
Leu: 000001, 000011, 100000, 100010, 100001, 100011;
Ser: 001000, 001010, 001001, 001011, 011100, 011110.
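The passage from 4-adic to 2-adic words can be sketched in Python as follows. The names `BITS`, `to_2adic` and `value` are assumptions for illustration; the digit pairs themselves are those of the 2-adic alphabet given in the text.

```python
# Each 4-adic digit becomes two binary digits, lowest digit first:
# 0 (U) -> 00, 1 (C) -> 10, 2 (A) -> 01, 3 (G) -> 11.
BITS = {"0": "00", "1": "10", "2": "01", "3": "11"}

def to_2adic(word4):
    """Rewrite a 4-adic codon word as a 2-adic word of twice the length."""
    return "".join(BITS[d] for d in word4)

def value(word, m):
    """Natural number represented by a digit string in base m, lowest digit first."""
    return sum(int(d) * m**j for j, d in enumerate(word))

print(to_2adic("203"))                        # 010011 (Met)
print([to_2adic(w) for w in ("310", "311", "312", "313")])  # the four Ala codons

# The 4-adic word and its 2-adic form represent the same natural number.
assert value("203", 4) == value(to_2adic("203"), 2) == 50
```

This is the elementary fact behind the statement that the 4-adic and 2-adic spaces can be identified.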



7 Dynamical model for degeneracy of 
the genetic code 

We shall study dynamical systems corresponding to maps

$$\mathbf{Z}_m \to \mathbf{Z}_m, \quad x \to f(x). \qquad (5)$$

As usual, we study the behaviour of the iterations $x_n = f^n(x_0)$,
$x_0 \in \mathbf{Z}_m$, where $f^n(x) = f \circ \ldots \circ f(x) =
f(\ldots(f(f(x)))\ldots)$ is the result of $n$ successive applications
of the map $f$. We shall use the standard terminology of the theory of
dynamical systems. If $f(x_0) = x_0$ then $x_0$ is a fixed point. If
$x_n = x_0$ for some $n = 1, 2, \ldots$ we say that $x_0$ is a
periodic point. If $n$ is the smallest natural number with this
property then $n$ is said to be the period of $x_0$. We denote the
corresponding cycle by $\gamma = (x_0, x_1, \ldots, x_{n-1})$. In
particular, a fixed point $x_0$ is a periodic point of period 1.
Obviously $x_0$ is a fixed point of the iterated map $f^n$ if $x_0$
is a periodic point of period $n$.

The simplest dynamical laws are given by monomial functions
$f_s(x) = x^s$, $s = 2, 3, \ldots$ (each branch of the m-adic tree is
multiplied by itself $s$ times, producing a new branch).
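Fixed points and cycles can be explored numerically on a finite level of the tree. The sketch below is an illustration under an assumption not made in the text: the 2-adic integers are replaced by their first six digits, i.e. by residues modulo 2**6, which is legitimate for polynomial maps with integer coefficients. The helper name `cycles` is hypothetical.

```python
def cycles(f, points):
    """Return all cycles of the map f restricted to a finite set of points."""
    found, seen = [], set()
    for start in points:
        # Follow the orbit until it repeats or enters an explored region.
        orbit, x = [], start
        while x not in orbit and x not in seen:
            orbit.append(x)
            x = f(x)
        if x in orbit:  # the orbit closed up on itself: a new cycle
            found.append(tuple(orbit[orbit.index(x):]))
        seen.update(orbit)
    return found

# Monomial map f_2(x) = x**2 on the first six 2-adic digits (mod 2**6).
square = lambda x: (x * x) % 64
print(cycles(square, range(64)))  # [(0,), (1,)]
```

At this level the squaring map has only the two fixed points 0 and 1: every even residue is eventually mapped to 0 and every odd residue to 1.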

Our basic idea is to associate with the genetic code some polynomial

$$f_{\rm genetic}(x) = a_0 + a_1 x + \ldots + a_n x^n, \quad x \in \mathbf{Z}_m,$$

where, depending on the choice of the coding system, $m = 4$ or $m = 2$.

Such a polynomial encodes amino acids in the following way. The set
of codons (which are considered as 2-adic numbers) is split by this
polynomial into cycles. Each cycle encodes one amino acid, so:

Amino acids are coded by cycles of this polynomial.

Our model cannot explain the origin of such a coding polynomial. Its
origin may be a consequence of biological evolution or of purely
informational features of the genetic system. Since we do not know
the (e.g., biological) background inducing a coding polynomial
$f_{\rm genetic}(x)$, we are not able to choose it in a unique way. In
this note we propose one possible solution of the problem of finding
a coding polynomial.

We shall use the well-known Mahler polynomials. To proceed in this
way, we choose the 2-adic genetic coding. Since $m = 2$ is a prime
number, the system of 2-adic integers $\mathbf{Z}_2$ can be extended
to the field of 2-adic numbers $\mathbf{Q}_2$. We recall that in a
number field one can use all arithmetic operations: addition,
subtraction, multiplication and division. We need these operations to
define a Mahler polynomial (the main problem is division). The result
would be a map $f_{\rm genetic} : \mathbf{Z}_2 \to \mathbf{Q}_2$ having
the structure of cycles corresponding to the genetic code of amino
acids.

Suppose we know the values of some function
$f : \mathbf{Z}_2 \to \mathbf{Q}_2$ at the points $j = 0, 1, \ldots, n$.
Then its nth Mahler coefficient is defined by

$$a_n = \sum_{j=0}^{n} (-1)^{n-j} \binom{n}{j} f(j).$$

The corresponding Mahler polynomial has the form

$$F_n(x) = \sum_{k=0}^{n} a_k \binom{x}{k},$$

where the binomial polynomial is

$$\binom{x}{k} = \frac{x(x-1)(x-2)\cdots(x-k+1)}{k!}.$$

The crucial point is that

$$f(j) = F_n(j), \quad j = 0, 1, \ldots, n.$$
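The interpolation property can be checked with exact rational arithmetic. The following Python sketch uses the standard formula for Mahler coefficients; the function names and the sample values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import comb, factorial

def mahler_coefficients(values):
    """a_k = sum_{j=0}^{k} (-1)**(k-j) * C(k, j) * f(j), given values[j] = f(j)."""
    return [
        sum((-1) ** (k - j) * comb(k, j) * values[j] for j in range(k + 1))
        for k in range(len(values))
    ]

def binomial_poly(x, k):
    """Binomial polynomial C(x, k) = x(x-1)...(x-k+1)/k! as an exact rational."""
    numerator = 1
    for i in range(k):
        numerator *= x - i
    return Fraction(numerator, factorial(k))

def mahler_polynomial(values):
    """Return F_n as a function, where n = len(values) - 1."""
    coeffs = mahler_coefficients(values)
    return lambda x: sum(a * binomial_poly(x, k) for k, a in enumerate(coeffs))

values = [3, 1, 4, 1, 5]  # hypothetical values f(0), ..., f(4)
F = mahler_polynomial(values)
assert [F(j) for j in range(len(values))] == values  # F_n(j) = f(j)
```

Division enters only through the factor 1/k! in the binomial polynomials, which is why the field of 2-adic numbers is needed for general arguments.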

Coming back to the genetic code, we see that there are 64 different
points-codons. Thus we need a Mahler polynomial of degree at most 63
such that

Met: f(010011) = 010011; Trp: f(001111) = 001111;
Asn: f(010100) = 010110, f(010110) = 010100;
Asp: f(110100) = 110110, f(110110) = 110100; ...,

Ser: f(001000) = 001010, f(001010) = 001001, f(001001) = 001011,
f(001011) = 011100, f(011100) = 011110, f(011110) = 001000.
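One such map can be constructed explicitly. The Python sketch below is only an illustration: it assumes the standard genetic code written as a 64-letter string (bases ordered U, C, A, G), identifies every codon with the natural number of its 2-adic word, and lets each amino acid (and the stop signal) form one cycle in table order. The cyclic order is a free choice, and all names are hypothetical.

```python
from math import comb

# Standard genetic code, bases ordered U, C, A, G (4-adic digits 0, 1, 2, 3).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

def codon_value(i):
    """Natural number of the codon with table index i (first base = i // 16)."""
    digits4 = (i // 16, (i // 4) % 4, i % 4)
    # A 4-adic word and its 2-adic rewriting represent the same number.
    return sum(d * 4**p for p, d in enumerate(digits4))

def genetic_map():
    """Return (f, groups): f sends each codon to the next one in its cycle."""
    groups = {}
    for i, aa in enumerate(AMINO_ACIDS):
        groups.setdefault(aa, []).append(codon_value(i))
    f = {}
    for points in groups.values():
        for here, nxt in zip(points, points[1:] + points[:1]):
            f[here] = nxt
    return f, groups

f, groups = genetic_map()
values = [f[j] for j in range(64)]

# Mahler coefficients a_0, ..., a_63 of the interpolating polynomial.
coeffs = [
    sum((-1) ** (k - j) * comb(k, j) * values[j] for j in range(k + 1))
    for k in range(64)
]

# At integer points C(j, k) is an ordinary binomial coefficient,
# so the interpolation property F(j) = f(j) can be checked exactly.
assert all(
    sum(a * comb(j, k) for k, a in enumerate(coeffs)) == values[j]
    for j in range(64)
)
print(len(groups), "cycles; Met fixed:", f[50] == 50)
```

With this choice the 64 codons split into 21 cycles, one for each of the 20 amino acids and one for the stop codons.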

8 Representation of gene code through 
dynamics of fuzzy cycles 

By using the 2-adic coding we can represent each codon by a 2-adic
ball of radius $r = 1/64$ centered at the corresponding 2-adic word,
for example $010011 \to U_{1/64}(010011)$. This is the set of all
2-adic sequences whose first six digits coincide with the codon word
010011. Thus the amino acid Met can be represented by the ball
$U_{1/64}(010011)$ and Trp by $U_{1/64}(001111)$, whereas Asn is
represented by the union of two balls,
$U_{1/64}(010100) \cup U_{1/64}(010110)$, and, e.g., Ser by the union
of six balls

$$U_{1/64}(001000) \cup U_{1/64}(001010) \cup U_{1/64}(001001) \cup U_{1/64}(001011) \cup U_{1/64}(011100) \cup U_{1/64}(011110).$$
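Membership in such a union of balls reduces to a comparison of prefixes. A minimal Python sketch, with hypothetical names `in_ball` and `encodes` and with 2-adic sequences given as digit strings of length at least six:

```python
def in_ball(x, center, k=6):
    """True if x lies in the 2-adic ball of radius 2**-k around center,
    i.e. if the first k digits of x and center coincide."""
    return x[:k] == center[:k]

def encodes(x, codon_words):
    """True if x lies in the union of the balls U_{1/64} around the codon words."""
    return any(in_ball(x, c) for c in codon_words)

# Codon words of Ser in the 2-adic coding (UCU, UCC, UCA, UCG, AGU, AGC).
SER = ["001000", "001010", "001001", "001011", "011100", "011110"]

print(encodes("0010111010", SER))  # True: the sequence starts with 001011
print(encodes("0100110000", SER))  # False: the sequence starts with 010011 (Met)
```

Digits beyond the sixth position do not affect the result, which reflects the stability of the ball representation under small perturbations.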

We remark that in Dragovich, 2006, a 5-adic model was considered to
explain the origin of the genetic code. In this model 5-adic balls
were used to classify codons.

In Khrennikov, 1997, fuzzy cycles, i.e. cycles of balls, were also
considered:

$$U_{r_1}(a_1) \to U_{r_2}(a_2) \to \cdots \to U_{r_k}(a_k) \to U_{r_1}(a_1).$$

One can easily define the notions of an attractor fuzzy cycle and a
Siegel fuzzy cycle. The basin of attraction of a fuzzy cycle is the
set of all points which are attracted by such a cycle.

As shown in Dubischar et al, 1999, and Khrennikov and Nilsson, 2004,
the consideration of fuzzy cycles is more natural, since they are
stable with respect to noise (ordinary cycles can easily be disturbed
by noisy perturbations). We now consider a model in which the
"genetic polynomial" $f_{\rm genetic}(x)$ encodes amino acids in the
following way: amino acids are coded by fuzzy cycles of this
polynomial. However, at the moment we do not have mathematical
examples of simple polynomials having the structure of fuzzy cycles
corresponding to the genetic code. We shall continue the study of
this problem in one of our subsequent papers.

Conclusion. We have seen that the genetic code has a natural
4-adic (or 2-adic) structure. Gene expression could be coupled to a
dynamical system in the genetic information space.